Clustering and dendrogram

Sex appears to be the determining factor in the hierarchical clustering. Neither disease status nor ethnicity clusters in any meaningful way. One sample also appears to have a mismatched ‘sex’ label.
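One way to make the mismatched-label observation concrete is to cut the sample dendrogram into two clusters and flag any sample whose recorded sex disagrees with the majority label of its cluster. The sketch below assumes average linkage on Euclidean distance between per-sample methylation values. The original analysis may have used a different distance or linkage. The function names `average_linkage_clusters` and `flag_label_mismatches` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(samples, n_clusters):
    """Agglomerative clustering with average linkage (hypothetical helper).

    samples: dict mapping sample id -> list of methylation values.
    Returns a list of clusters, each a sorted list of sample ids.
    """
    clusters = [[sid] for sid in samples]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise distance between the two clusters' members.
            total = sum(euclidean(samples[a], samples[b])
                        for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        # Delete the higher index first so the lower index stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


def flag_label_mismatches(clusters, labels):
    """Return sample ids whose label differs from their cluster's majority label."""
    flagged = []
    for cluster in clusters:
        majority, _ = Counter(labels[s] for s in cluster).most_common(1)[0]
        flagged.extend(s for s in cluster if labels[s] != majority)
    return sorted(flagged)


# Hypothetical toy data: two CpGs whose methylation differs by sex.
samples = {
    "s1": [0.90, 0.10], "s2": [0.88, 0.12], "s3": [0.91, 0.09],
    "s4": [0.15, 0.85], "s5": [0.12, 0.88], "s6": [0.14, 0.86],
}
# s3 is recorded as "M" but its profile matches the "F" samples.
labels = {"s1": "F", "s2": "F", "s3": "M", "s4": "M", "s5": "M", "s6": "M"}
clusters = average_linkage_clusters(samples, n_clusters=2)
print(flag_label_mismatches(clusters, labels))  # ['s3']
```

With real data, a library implementation of hierarchical clustering would be preferable to this quadratic loop; the point here is only the majority-label check after cutting the tree.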

Heatmap

The heatmap uses the same clustering. The most distant column groups are separated by sex. Only a subset of the most variable CpGs is displayed, and some of these separate the sexes.
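The heatmap restricts its rows to the most variable CpGs. A minimal sketch of that selection step follows, assuming variability is measured as the sample variance of each CpG's beta values across samples. The cutoff `k`, the function name `top_variable_cpgs`, and the toy CpG identifiers are hypothetical, since the number of CpGs actually shown is not stated here.

```python
import statistics


def top_variable_cpgs(beta, k):
    """Return the ids of the k CpGs with the highest sample variance.

    beta: dict mapping CpG id -> list of beta values (one per sample).
    Ties are broken by CpG id so the result is deterministic.
    """
    ranked = sorted(beta, key=lambda cpg: (-statistics.variance(beta[cpg]), cpg))
    return ranked[:k]


# Hypothetical toy matrix: 4 CpGs measured in 4 samples.
beta = {
    "cg_a": [0.50, 0.51, 0.49, 0.50],  # nearly constant across samples
    "cg_b": [0.10, 0.90, 0.15, 0.85],  # large between-sample differences
    "cg_c": [0.30, 0.40, 0.35, 0.45],
    "cg_d": [0.20, 0.80, 0.20, 0.80],
}
print(top_variable_cpgs(beta, k=2))  # ['cg_b', 'cg_d']
```

Because the selection is purely variance-based, CpGs with large sex differences are likely to pass the filter, which is consistent with some displayed rows separating the sexes.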

PCA

Screeplot

It would take 119 principal components to capture 90% of the variance in the data.
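The 119-component figure comes from the cumulative proportion of variance explained. A sketch of that calculation follows, assuming the per-component variances (eigenvalues, or squared singular values) are already available from the PCA and sorted in decreasing order. The function name `n_components_for` is hypothetical, and the toy eigenvalues are illustrative, not values from this dataset.

```python
from itertools import accumulate


def n_components_for(variances, threshold=0.90):
    """Smallest number of leading PCs whose cumulative share of variance >= threshold.

    variances: per-component variances sorted in decreasing order.
    """
    total = sum(variances)
    for k, cum in enumerate(accumulate(variances), start=1):
        if cum / total >= threshold:
            return k
    return len(variances)


# Illustrative eigenvalues (not from this dataset); they sum to 100.
eigenvalues = [40.0, 25.0, 15.0, 8.0, 5.0, 4.0, 3.0]
# Cumulative shares: 0.40, 0.65, 0.80, 0.88, 0.93, ...
print(n_components_for(eigenvalues))  # 5
```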

Pair plot

Points are colored by sex and shaped by disease status. The sample with a mismatched ‘sex’ label is visible here too.

PC Heatmap

The heatmap of principal component scores doesn’t reveal much.

PC relation with metadata

Sex

Disease Status

Age

The lighter the point, the higher the age.

Ethnicity

Of the metadata variables examined, only sex appears to be associated with any of the first five PCs.
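The plots above judge association by eye. One way to quantify it is to compute, for each PC, the share of its variance explained by a metadata variable. For a categorical variable such as sex or ethnicity this is the one-way ANOVA effect size eta squared; for a continuous variable such as age, the squared Pearson correlation plays the same role. This statistic is an assumption, not necessarily what the original analysis used, and the function names and toy scores below are hypothetical.

```python
import statistics


def eta_squared(scores, groups):
    """Share of PC-score variance explained by a categorical variable.

    eta^2 = between-group sum of squares / total sum of squares.
    """
    grand_mean = statistics.fmean(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = 0.0
    for level in set(groups):
        members = [x for x, g in zip(scores, groups) if g == level]
        ss_between += len(members) * (statistics.fmean(members) - grand_mean) ** 2
    return ss_between / ss_total


def r_squared(scores, values):
    """Squared Pearson correlation between PC scores and a continuous covariate."""
    mx, my = statistics.fmean(scores), statistics.fmean(values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, values))
    sxx = sum((x - mx) ** 2 for x in scores)
    syy = sum((y - my) ** 2 for y in values)
    return sxy * sxy / (sxx * syy)


# Hypothetical PC1 scores and metadata for six samples.
pc1 = [2.1, 1.9, 2.3, -2.0, -2.2, -1.8]
sex = ["F", "F", "F", "M", "M", "M"]
age = [34, 61, 45, 50, 29, 58]
print(round(eta_squared(pc1, sex), 3))  # close to 1: sex separates PC1
print(round(r_squared(pc1, age), 3))    # close to 0: age does not
```

Repeating this for each of the first five PCs and each metadata variable gives a small table of effect sizes; a formal test (ANOVA F-test or correlation test) could be added on top if p-values are needed.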